Abstract

The inferential model (IM) framework provides valid prior-free probabilistic
inference by focusing on predicting unobserved auxiliary variables. But efficient
IM-based inference can be challenging when this auxiliary variable is high-dimensional.
Here we show that characteristics of the auxiliary variable are often fully observed
and, in such cases, a simultaneous dimension reduction and information aggregation
can be achieved by conditioning. This proposed conditioning strategy leads
to efficient IM inference, and casts new light on Fisher's notions of sufficiency,
conditional inference, and also Bayesian inference. A differential equation-driven
selection of a conditional association is developed, and we prove a conditional IM
validity theorem under some conditions. Some problems, however, may not admit
a valid conditional IM of the standard form. For such cases, we propose a more
flexible class of conditional IMs based on localization. The take-away message is
that the conditional IM framework developed herein provides valid and efficient
prior-free probabilistic inference in a variety of challenging problems.

Keywords and phrases: Ancillary; auxiliary variable; Bayes; belief function; 
differential equation; sufficiency; predictive random set; validity. 



1 Introduction 



Fisher's brand of statistical inference (Fisher 1973) is often viewed as a middle ground
between the Bayesian and frequentist approaches. Two important examples are his fiducial
argument and his ideas on conditional inference. Perhaps influenced by Fisher's ideas, a
current focus in foundational research is on achieving some kind of compromise between
the Bayesian and frequentist ideals. See, for example, recent work on fiducial inference
(Hannig 2009, 2012; Hannig and Lee 2009), confidence distributions (Xie and Singh 2012;
Xie et al. 2011), Dempster-Shafer theory (Dempster 2008; Shafer 2011), and objective
Bayes with default, reference, and/or data-dependent priors (Berger 2006; Berger et al.
2009; Fraser 2011; Fraser et al. 2010). Recently, Martin and Liu (2012a) have laid out
the details of a promising new inferential model (IM) approach; see also Martin et al.
(2010) and Zhang and Liu (2011). IMs take the usual input (sampling model and observed
data) and produce prior-free, posterior-probabilistic measures of certainty about
any assertion/hypothesis of interest, with an almost automatic calibration property. The
fundamental idea is that uncertainty about the parameter of interest $\theta$, given observed
data $X = x$, is fully characterized by the unobserved value $u^*$ of an associated auxiliary
variable $U$. So the problem of inference about $\theta$ can be translated into one of predicting
this unobserved value $u^*$ with a predictive random set. In Section 2 we briefly review the
construction and basic properties of IMs.

The discussion in Martin and Liu (2012a) focuses on the case where $\theta$ and $u^*$ are of
the same dimension. But there are many problems, e.g., iid data from scalar parameter
models, where the dimension of the auxiliary variable is much greater than that of the
parameter. In such cases, efficiency can be gained by first reducing the dimension of the
auxiliary variable to be predicted, though it is not at all obvious how to perform this
dimension reduction in general. In this paper we focus our attention on an auxiliary
variable dimension reduction step based on conditioning. The critical observation here
is that, typically, certain functions of the auxiliary variables are fully observed. So, by
conditioning on those observed characteristics of the auxiliary variable, we can effectively
reduce the dimension of the unobserved characteristics to be predicted. The fundamental
result, proved in Section 3.2, is that this reduction is accomplished without loss of
information. Therefore, we can view this dimension-reduction approach as a tool for
combining information about $\theta$ across samples, a counterpart to Bayes' theorem and Fisher's
sufficiency. From the resulting lower-dimensional auxiliary variable representation, we
proceed to construct what is called a conditional IM. In Section 3.4 we give a general
validity theorem that establishes a desirable calibration property of the conditional IM,
which helps facilitate a common interpretation across users and experiments.

Finding the dimension-reduced representation, the subject of Section 4, is sometimes a
familiar task. For example, when the minimal sufficient statistic has dimension matching
that of the parameter, the conditional IM is exactly that obtained by working directly with
said statistic. In other cases, finding the lower-dimensional representation is not so simple,
analogous to finding ancillary statistics in the classical context. For this, we propose
a new differential equation-driven technique for identifying observed characteristics of
the auxiliary variable. Two classical conditional inference problems are worked out in
Section 5, one showing how the proposed differential equation technique leads to an
additional dimension reduction beyond what ordinary sufficiency provides. So, besides
the development of conditional IMs, the proposed framework also casts new light on the
familiar notion of sufficiency, as well as Fisher's attractive but elusive ideas on ancillary
statistics, conditional inference, etc.

In some cases, however, it may not be possible to produce a valid conditional IM with
these somewhat standard techniques. For this, in Section 6 we propose an extension
of the conditional IM framework which allows the lower-dimensional auxiliary variable
representation to depend on $\theta$ in a certain sense. We refer to these as local conditional IMs,
and we describe their construction and prove a validity theorem. An important example
of such a problem is the bivariate normal model with known means and variances but
unknown correlation. For this example, we construct a local conditional IM based on a
modification of the differential equations technique, and provide the results of a simulation
study that shows that our conditional plausibility intervals outperform the classical
$r^*$-driven asymptotically approximate confidence intervals (Barndorff-Nielsen 1986; Fraser
1990) in both small and large samples. A local conditional IM analysis of the
variance-components problem in Cox (1980) is also given.

We conclude in Section 7 with some remarks. In particular, we highlight the main
contributions of this paper and discuss some potential extensions of these important ideas 
and results, including a related IM strategy for nuisance parameter problems. 



2 Review of IMs 

2.1 Notation and construction 

To fix notation, let $X$ be the observable data, taking values in a space $\mathbb{X}$, and let
$\theta$ be the parameter of interest, taking values in the parameter space $\Theta$. The starting
point of the IM framework is similar to that of fiducial, in the sense that an auxiliary
variable, denoted by $U$ and taking values in a space $\mathbb{U}$ with probability measure $P_U$, is
associated with $X$ and $\theta$. It is this association, together with the distribution $U \sim P_U$,
that characterizes the sampling distribution $X \sim P_{X|\theta}$. In particular, if we write this
association as

$X = a(\theta, U), \quad U \sim P_U,$ (2.1)

then we require that $X$ generated according to the above "algorithm," i.e., first sample
$U \sim P_U$ and then set $X = a(\theta, U)$ for given $\theta$, have distribution $P_{X|\theta}$.
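
As a minimal sketch of the sampling "algorithm" behind (2.1), assume the illustrative special case $a(\theta, u) = \theta + u$ with $P_U = N(0, 1)$, so that $P_{X|\theta} = N(\theta, 1)$. The function names `a` and `sample_x` are hypothetical and chosen only for this illustration.

```python
import random
import statistics

def a(theta, u):
    # Assumed association for illustration only: x = theta + u.
    return theta + u

def sample_x(theta, n_draws, seed=0):
    # First sample U ~ P_U = N(0, 1), then set X = a(theta, U).
    rng = random.Random(seed)
    return [a(theta, rng.gauss(0.0, 1.0)) for _ in range(n_draws)]

draws = sample_x(theta=2.0, n_draws=20000)
# The sample mean and standard deviation should be close to theta and 1.
print(statistics.mean(draws), statistics.stdev(draws))
```

Under this assumed association, the draws behave like a sample from $N(\theta, 1)$, which is the requirement placed on (2.1).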

Compared to fiducial inference, which employs the sampling distribution after
$X = x$ is observed, the IM approach takes a different perspective. Specifically, the IM
approach treats the unobserved value $u^*$ of $U$, which is tied to the observed data $X = x$
and the true value of $\theta$, as the fundamental quantity. Then the goal is to predict this
unobserved value $u^*$ with a random set before conditioning on $X = x$ and inverting (2.1).



Here we follow Martin and Liu (2012a); see their paper for full details. Start with a
collection $\mathbb{S}$ of $P_U$-measurable subsets of $\mathbb{U}$, assumed to contain $\varnothing$ and $\mathbb{U}$. This
collection will serve as the support of the predictive random set. For optimal predictive
random sets, it suffices to assume that the collection $\mathbb{S}$ is nested, i.e., either
$S \subseteq S'$ or $S' \subseteq S$ for all $S, S' \in \mathbb{S}$. We can now define the predictive random set
$\mathcal{S}$, supported on $\mathbb{S}$, with distribution $P_{\mathcal{S}}$ satisfying

$P_{\mathcal{S}}\{\mathcal{S} \subseteq K\} = \sup_{S \in \mathbb{S} : S \subseteq K} P_U(S), \quad K \subseteq \mathbb{U}.$

$P_{\mathcal{S}}\{\mathcal{S} \subseteq \cdot\}$ is like the "distribution function" of the random set $\mathcal{S}$. Predictive
random sets constructed in this way are called admissible. In scalar $\theta$ problems, $P_U$ is
often $\mathsf{Unif}(0, 1)$, so an important example of an admissible predictive random set is

$\mathcal{S} = \{u : |u - 0.5| < |U - 0.5|\}, \quad U \sim \mathsf{Unif}(0, 1).$ (2.2)

Martin and Liu (2012a, Corollary 1) show that this $\mathcal{S}$ has a variety of good properties,
and these good properties often carry over to the corresponding IM, provided that the set
$\Theta_x(u) = \{\theta : x = a(\theta, u)\}$ moves monotonically$^1$ as a (set-valued) function of $u$ for each
$x$. We shall also employ this "default" predictive random set in our examples herein.
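
To make the default predictive random set (2.2) concrete, the following sketch computes the probability that $\mathcal{S}$ contains a fixed point $u$. Since $|U - 0.5| \sim \mathsf{Unif}(0, 0.5)$, this probability is $1 - |2u - 1|$; the Monte Carlo version is included only as a sanity check. The function names are illustrative assumptions, not notation from the paper.

```python
import random

def contains_prob(u):
    # P{S contains u} = P{|U - 0.5| > |u - 0.5|} = 1 - |2u - 1| for U ~ Unif(0, 1).
    return 1.0 - abs(2.0 * u - 1.0)

def contains_prob_mc(u, n_draws=100000, seed=1):
    # Monte Carlo estimate of the same probability, drawing U ~ Unif(0, 1).
    rng = random.Random(seed)
    hits = sum(abs(u - 0.5) < abs(rng.random() - 0.5) for _ in range(n_draws))
    return hits / n_draws

for u in (0.5, 0.7, 0.95):
    print(u, contains_prob(u), contains_prob_mc(u))
```

Points near 0.5 are almost always captured, while points near 0 or 1 are captured only rarely, which is what makes (2.2) a natural two-sided choice.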

The following three steps, namely association, predict, and combine, described in
Martin and Liu (2012a), together define an IM.

A-step. Associate $X$, $\theta$, and $U \sim P_U$, consistent with the sampling distribution
$X \sim P_{X|\theta}$, such that, for all $(x, u)$, there is a unique subset
$\Theta_x(u) = \{\theta : x = a(\theta, u)\} \subseteq \Theta$, possibly empty, containing all possible candidate
values of $\theta$ given $(x, u)$.

P-step. Predict the unobserved value $u^*$ of $U$ associated with the observed data by an
admissible predictive random set $\mathcal{S}$.

C-step. Combine $\mathcal{S}$ and the association $\Theta_x(u)$ specified in the A-step to obtain

$\Theta_x(\mathcal{S}) = \bigcup_{u \in \mathcal{S}} \Theta_x(u).$ (2.3)

Then compute the belief function

$\mathsf{bel}_x(A; \mathcal{S}) = P_{\mathcal{S}}\{\Theta_x(\mathcal{S}) \subseteq A \mid \Theta_x(\mathcal{S}) \neq \varnothing\},$ (2.4)

where $A \subseteq \Theta$ is the assertion/hypothesis about $\theta$ of interest.

The belief function is just one part of the inferential output. Since the belief function
is sub-additive, i.e., $\mathsf{bel}_x(A; \mathcal{S}) + \mathsf{bel}_x(A^c; \mathcal{S}) \leq 1$, one actually needs both
$\mathsf{bel}_x(A; \mathcal{S})$ and $\mathsf{bel}_x(A^c; \mathcal{S})$ to summarize the information in $x$ concerning the
truthfulness of assertion $A$. In some cases, it is more convenient to report the
plausibility function

$\mathsf{pl}_x(A; \mathcal{S}) = 1 - \mathsf{bel}_x(A^c; \mathcal{S}).$ (2.5)

Then the pair $(\mathsf{bel}_x, \mathsf{pl}_x)(A; \mathcal{S})$ characterizes the IM output. Note that there are reasons
one might consider using a different predictive random set for each of $\mathsf{bel}_x(A; \cdot)$ and
$\mathsf{bel}_x(A^c; \cdot)$; see Martin and Liu (2012a) and Martin et al. (2012). These two papers also
provide a variety of examples illustrating the construction of IMs.
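
The three steps can be sketched for a single observation, assuming the illustrative association $x = \theta + \Phi^{-1}(u)$ with $U \sim \mathsf{Unif}(0, 1)$ and the default predictive random set (2.2). Under these assumptions, $\Theta_x(\mathcal{S})$ is an interval around $x$, and for the one-sided assertion $A = (-\infty, \theta_0]$ a short calculation gives $\mathsf{bel}_x(A; \mathcal{S}) = \max\{0, 1 - 2\Phi(x - \theta_0)\}$ and $\mathsf{pl}_x(A; \mathcal{S}) = 1 - \max\{0, 2\Phi(x - \theta_0) - 1\}$. The function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def bel_left(x, theta0):
    # bel_x(A; S) for A = (-inf, theta0], under the assumed N(theta, 1) association.
    return max(0.0, 1.0 - 2.0 * PHI(x - theta0))

def pl_left(x, theta0):
    # pl_x(A; S) = 1 - bel_x(A^c; S), with A^c = (theta0, inf).
    return 1.0 - max(0.0, 2.0 * PHI(x - theta0) - 1.0)

# Belief and plausibility of "theta <= 1" after observing x = 0.
print(bel_left(0.0, 1.0), pl_left(0.0, 1.0))
```

The gap between the two numbers reflects the sub-additivity noted above: belief in $A$ and belief in $A^c$ need not sum to one.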

Without practical loss of generality, assume that $\{P_{X|\theta} : \theta \in \Theta\}$ has a common
dominating measure, say $\mu$. Then we require that $\mathsf{bel}_x(A; \mathcal{S})$ be a $\mu$-measurable function
in $x$ for all $A$. This is easy to check in examples, but general sufficient conditions are more
elusive. To keep presentation simple, we shall mostly ignore these technical concerns.



2.2 Validity of IMs 

The performance of a predictive random set is measured through the sampling behavior
of the corresponding belief function, as a function of $X \sim P_{X|\theta}$, for a given assertion $A$.
Given $\mathcal{S}$, the corresponding IM is valid for $A$ if the belief function satisfies

$\sup_{\theta \notin A} P_{X|\theta}\{\mathsf{bel}_X(A; \mathcal{S}) \geq 1 - \alpha\} \leq \alpha, \quad \forall\, \alpha \in (0, 1).$ (2.6)

$^1$ A set-valued mapping $\Theta_x(\cdot)$ moves monotonically right (left) if, for $u_2 > u_1$, for any
$\theta \in \Theta_x(u_1)$, there exists a $\theta' \in \Theta_x(u_2)$ such that $\theta' > \theta$ ($\theta' < \theta$).

The IM is simply called valid if it is valid for all $A$. In other words, the IM is valid for
$A$ if $\mathsf{bel}_X(A; \mathcal{S})$ is stochastically no larger than $\mathsf{Unif}(0, 1)$ when $X \sim P_{X|\theta}$ with $\theta \notin A$.
That is, if $A$ is false, then the amount of support in data $X$ for $A$ will be large only
for a relatively small proportion of $X$ values. Martin and Liu (2012a, Theorem 1) show
that this validity property is easy to arrange: it holds for all $A$ whenever the predictive
random set $\mathcal{S}$ is admissible in the sense described above. In this case, if the IM is valid
for all $A$, then (2.6) can be equivalently stated in terms of the plausibility function:

$\sup_{\theta \in A} P_{X|\theta}\{\mathsf{pl}_X(A; \mathcal{S}) \leq \alpha\} \leq \alpha, \quad \forall\, \alpha \in (0, 1).$ (2.7)

This formulation is occasionally more convenient than (2.6).
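
A small Monte Carlo experiment can illustrate (2.7). The sketch assumes the illustrative model $X \sim N(\theta, 1)$ with the default predictive random set (2.2), for which the plausibility of the singleton assertion $A = \{\theta\}$ is $1 - |2\Phi(x - \theta) - 1|$. The names `pl_point` and `rejection_rate` are hypothetical.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def pl_point(x, theta):
    # Plausibility of the singleton {theta} under the assumed N(theta, 1) model.
    return 1.0 - abs(2.0 * PHI(x - theta) - 1.0)

def rejection_rate(theta, alpha, n_reps=20000, seed=2):
    # Estimate P_{X|theta}{pl_X(theta) <= alpha} when theta is the true value.
    rng = random.Random(seed)
    hits = sum(pl_point(theta + rng.gauss(0.0, 1.0), theta) <= alpha
               for _ in range(n_reps))
    return hits / n_reps

# In this continuous example the bound in (2.7) is attained, so the
# estimate should be close to alpha.
print(rejection_rate(theta=1.5, alpha=0.10))
```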

There are two important consequences of the validity theorem. First, it helps determine
an objective scale on which the belief probabilities can be interpreted. Martin et al.
(2012) discuss notions of meaningfulness within and across experiments, and they argue
that the validity theorem is critical to having both these interpretations simultaneously.
Bayesian, fiducial, and Dempster-Shafer probabilities are subjective and, therefore, unlike
valid IMs, they do not have a common scale on which they can be interpreted. Second,
if one so chooses, the validity theorem allows one to use the IM output to construct
frequentist decision procedures with control on error rates. For example, one can construct
a $100(1 - \alpha)\%$ plausibility region for $\theta$:

$\{\theta : \mathsf{pl}_x(\theta; \mathcal{S}) > \alpha\}.$ (2.8)

It follows easily from (2.7) that this plausibility region has nominal $1 - \alpha$ coverage
probability. But we should emphasize here that, although plausibility functions can be used to
construct frequentist procedures, the interpretation is quite different. For example, the
plausibility region is understood as the collection of points such that each is individually
sufficiently plausible, given $X = x$. Confidence/credible regions, on the other hand, do
not have such a simple yet sharp interpretation.
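
For the assumed $N(\theta, 1)$ illustration with the default predictive random set (2.2), the plausibility region (2.8) has a closed form: $\mathsf{pl}_x(\theta) > \alpha$ exactly when $|x - \theta| < z_{1 - \alpha/2}$. The sketch below computes the endpoints and confirms that the plausibility equals $\alpha$ on the boundary. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def pl_point(x, theta):
    # Plausibility of the singleton {theta} under the assumed N(theta, 1) model.
    return 1.0 - abs(2.0 * STD_NORMAL.cdf(x - theta) - 1.0)

def plausibility_interval(x, alpha):
    # Endpoints of {theta : pl_x(theta) > alpha}; the region itself is open.
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return (x - z, x + z)

lo, hi = plausibility_interval(x=0.3, alpha=0.05)
print(lo, hi, pl_point(0.3, lo), pl_point(0.3, hi))
```

In this special case the region coincides numerically with the familiar z-interval, even though, as noted above, its interpretation is different.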



3 Conditional IMs 



3.1 Motivation 



Most of the examples in Martin and Liu (2012a) have a scalar auxiliary variable $U$. This
makes construction of efficient predictive random sets relatively easy. However, a scalar
auxiliary variable is an extremely special case. To see this, suppose $X_1, \ldots, X_n$, $n > 1$,
are independent $N(\theta, 1)$ observations with common unknown mean $\theta$. In vector notation,
an association is $X = \theta 1_n + U$, where $1_n$ is an $n$-vector of unity, and $U \sim N_n(0, I)$. Without
careful thinking, it seems that one must predict an $n$-dimensional auxiliary variable $u^*$.
But efficient prediction of $u^*$ would be challenging if $n$ is even moderately large, so
reducing the dimension of $u^*$, ideally to one dimension, would be a desirable first step.
In a classical framework, one can avoid this dimensionality difficulty by reducing $X$ to a
sufficient statistic, e.g., $\bar{X}$, and constructing an association from there. In this and the next
section, we develop an IM-based theory that justifies this sort of intuition.
3.2 Dimension reduction via conditioning



As indicated in Section 3.1, efficient prediction of the unobserved auxiliary variable can
be difficult in moderate to high dimensions. Here we investigate an approach by which
a simultaneous aggregation of information and auxiliary variable dimension reduction
can be achieved. The intuition is that some functions of $u^*$ are actually observed, so
these characteristics do not need to be predicted. This provides a sort of aggregation
of information. Furthermore, these observed characteristics also provide a dimension
reduction which can help to better predict those aspects that remain unobserved. The
general strategy is as follows:

• Identify an observed characteristic of the auxiliary variable whose distribution is
free (or at least mostly free) of $\theta$, and

• define a conditional association that relates a $\dim(\theta)$-dimensional function of the
auxiliary variable to $\theta$ and some function of observable data $X$.

The second step is familiar, as it relates to working with, e.g., minimal sufficient statistics
of $\dim(\theta)$ dimension. The first step, however, is less familiar and can be difficult; see
Section 4. In Theorem 1 below, we show that this auxiliary variable dimension-reduction
scheme can be accomplished without loss of information. "Information" has a precise
meaning in the classical setting, via likelihood. Our framework is different, so we must
first explain what is meant by "without loss of information" in this setting.

In the theorem that follows, a naive IM is that based on a singleton predictive random
set. For example, in the baseline association, the naive IM uses $\mathcal{S} = \{U\}$ with $U \sim P_U$.
This is a poor choice of predictive random set from a practical point of view, but it is
a convenient choice when the goal is to compare two candidate associations, especially
when the auxiliary variable spaces are quite different. Given observation $X = x$, the
triplet $(x, a, P_U)$ is all that is needed to evaluate the naive IM's belief function.

Theorem 1. Suppose that the relationship $x = a(\theta, u)$ in the baseline association (2.1)
can be decomposed into the system:

$H(x) = \psi_H(u),$ (3.1a)
$T(x) = a_T(\psi_T(u), \theta),$ (3.1b)

where $x \mapsto (T(x), H(x))$ and $u \mapsto (\psi_T(u), \psi_H(u))$ are one-to-one and free of $\theta$. Let
$(V_T, V_H) \in \mathbb{V}_T \times \mathbb{V}_H$ be the image of $U$ under $(\psi_T, \psi_H)$, and let $P_{V_T|h}$ be a version of the
conditional distribution of $V_T$, given $V_H = h$, $h \in H(\mathbb{X})$. Then the baseline association
(2.1) and that determined by $T(x) = a_T(v_T, \theta)$ and $P_{V_T|H(x)}$ are equivalent for inference on
$\theta$ in the sense that, for any observed $X = x$, the triplets $(x, a, P_U)$ and $(T(x), a_T, P_{V_T|H(x)})$
produce identical naive IM belief functions.

The decomposition (3.1) boils down to a specification of a particular hierarchical
representation of the sampling model for $X$. Indeed, for functions $H$ and $T$ as in the
theorem, with $V_H = \psi_H(U)$ and $V_T = \psi_T(U)$, data $X \sim P_{X|\theta}$ can be simulated as follows.

1. Sample $(V_T, V_H)$ by sampling $V_H \sim P_{V_H}$ and $V_T \mid V_H \sim P_{V_T|V_H}$;

2. Obtain $X$ by solving the system $H(X) = V_H$ and $T(X) = a_T(V_T, \theta)$.
This hierarchical model representation provides the following insight: when $X = x$ is
observed, so too is the value of $V_H$, and this knowledge can be used to update the
auxiliary variable distribution, analogous to Bayes' theorem. Hence, Theorem 1 provides
a sort of aggregation of information. Further extensions and consequences of Theorem 1
are collected in a series of remarks in Section 3.3.
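
The two-step hierarchical simulation can be sketched for the normal mean example of Section 3.1, assuming the natural choices $T(x) = \bar{x}$, $H(x) = x - \bar{x} 1_n$, $\psi_T(u) = \bar{u}$, and $\psi_H(u) = u - \bar{u} 1_n$. In this assumed case $V_T$ and $V_H$ are independent, so drawing $U$ and splitting it is equivalent to the sequential scheme. The point of the sketch is that $H(X)$ reproduces $V_H$ exactly, whatever the value of $\theta$.

```python
import random

def simulate(theta, n, seed=3):
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    v_t = sum(u) / n                      # V_T = psi_T(U), the mean of U
    v_h = [ui - v_t for ui in u]          # V_H = psi_H(U), the residuals of U
    # Solve H(X) = V_H and T(X) = theta + V_T for X.
    x = [theta + v_t + r for r in v_h]
    return x, v_t, v_h

def H(x):
    # Observed characteristic: residuals of the data about their mean.
    xbar = sum(x) / len(x)
    return [xi - xbar for xi in x]

x, v_t, v_h = simulate(theta=5.0, n=4)
# Largest discrepancy between H(X) and V_H; zero up to rounding error.
print(max(abs(p - q) for p, q in zip(H(x), v_h)))
```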

Another important consequence of Theorem 1 is as follows. Let $V_T = \psi_T(U)$, taking
values in $\mathbb{V}_T = \psi_T(\mathbb{U})$. Then the theorem states that it suffices to consider a new
conditional association that connects transformed data $T(x)$, transformed auxiliary variable
$V_T$, and parameter $\theta$ through the mapping

$T(x) = a_T(v_T, \theta), \quad T(x) \in T(\mathbb{X}), \quad v_T \in \mathbb{V}_T,$ (3.2)

which seems to ignore the first equation (3.1a). On the contrary, (3.1a) cannot in general
be ignored, because it is used to update the auxiliary variable distribution from $P_U$
to $P_{V_T|H(x)}$. Only if it happens that $V_T$ and $V_H$ are independent does the first constraint
(3.1a) play no role. The important point is not the simple recasting of the association;
rather, it is that $\psi_T$ can often be chosen so that the new auxiliary variable
$V_T$ is of lower dimension than $U$. In fact, $V_T$ will often have dimension the same as that
of $\theta$. In addition to providing a sort of summary of the data, like in the classical context,
this auxiliary variable dimension reduction has a unique advantage in the IM context:
efficient predictive random sets for the lower-dimensional $V_T$ are easier to construct.

Once a decomposition (3.1) is available, construction of a new conditional IM follows
exactly as in Section 2. To simplify the presentation later on, here we restate the simple
three-step construction of a conditional IM.

A-step. Associate $T(x)$ and $\theta$ with the new auxiliary variable $v_T = \psi_T(u)$ to get the
collection of sets $\Theta_{T(x)}(v_T) = \{\theta : T(x) = a_T(v_T, \theta)\}$, $v_T \in \mathbb{V}_T$, based on (3.2).

P-step. Fix $h = H(x)$. Predict the unobserved value $v_T$ of $V_T$ with a conditionally
admissible predictive random set $\mathcal{S} \sim P_{\mathcal{S}|h}$ (see Section 3.4).

C-step. Combine results of the A- and P-steps to get

$\Theta_{T(x)}(\mathcal{S}) = \bigcup_{v_T \in \mathcal{S}} \Theta_{T(x)}(v_T) \subseteq \Theta.$ (3.3)

Then the corresponding conditional belief and plausibility functions are given by

$\mathsf{bel}_{T(x)|h}(A; \mathcal{S}) = P_{\mathcal{S}|h}\{\Theta_{T(x)}(\mathcal{S}) \subseteq A \mid \Theta_{T(x)}(\mathcal{S}) \neq \varnothing\},$
$\mathsf{pl}_{T(x)|h}(A; \mathcal{S}) = 1 - \mathsf{bel}_{T(x)|h}(A^c; \mathcal{S}).$ (3.4)

These functions can be used for inference on $\theta$ just like those in Section 2.
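
As a sketch of the conditional IM in the simplest setting, take the normal mean example of Section 3.1 with the assumed decomposition $T(x) = \bar{x}$ and $H(x) = x - \bar{x} 1_n$. Then $V_T \sim N(0, 1/n)$ independently of $V_H$, the conditional association can be written as $\bar{x} = \theta + n^{-1/2} \Phi^{-1}(w)$ with $W \sim \mathsf{Unif}(0, 1)$, and the default predictive random set (2.2) gives the singleton plausibility $1 - |2\Phi(\sqrt{n}(\bar{x} - \theta)) - 1|$. The name `pl_mean` is hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def pl_mean(xs, theta):
    # Conditional plausibility of {theta} based on T(x) = xbar, under the
    # assumed N(theta, 1) sampling model and default predictive random set.
    n = len(xs)
    xbar = sum(xs) / n
    return 1.0 - abs(2.0 * PHI(math.sqrt(n) * (xbar - theta)) - 1.0)

small = [1.0, 2.0, 3.0]
large = small * 4  # same sample mean, four times as many observations
print(pl_mean(small, 2.0), pl_mean(small, 3.0), pl_mean(large, 3.0))
```

With the sample mean held fixed, the larger sample assigns lower plausibility to values away from $\bar{x}$, which is the efficiency gain that the one-dimensional prediction problem is meant to deliver.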
3.3 Remarks 

Remark 1. Theorem 1 holds for more general decompositions (3.1). That is, one may
replace "$H(x) = \psi_H(u)$" in (3.1a) with "$c(x, u) = 0$" for a function $c$. However, this
more general "non-separable" case does not fit into the context of the conditional
validity theorem; see Theorem 2. So although (3.1a) is not necessary for Theorem 1, the
more general version is dangerous since the corresponding IM may not have the proper
calibration properties. We will have more to say about this in Section 6.
Remark 2. In some cases, it will be convenient to rewrite the conditional association (3.2)
to absorb the dependence on $h = H(x)$ into the function $a_T$, so that the auxiliary variable
and predictive random set can have a fixed distribution, independent of observed $x$. That
is, (3.2) may be rewritten as

$T(X) = a_T(W, \theta, h), \quad W \sim P_W,$

where $P_W$ is free of both $\theta$ and $h$. For example, if $V_T$ is one-dimensional, let $F_h$ be its
distribution function. If this distribution is continuous, then $W = F_h(V_T) \sim \mathsf{Unif}(0, 1)$;
in that case, set $a_T(w, \theta, h) = a_T(F_h^{-1}(w), \theta)$. See Section 5.

Remark 3. An immediate consequence of Theorem 1 is that, if two different decompositions
of the form (3.1) are available, then the two corresponding naive conditional IMs are
equivalent in the sense that their belief functions are identical. Therefore, although
different baseline associations could be chosen, and a variety of different ways to construct a
decomposition (3.1) are available, there is still a notion of conditional IM uniqueness: the
belief functions based on singleton predictive random sets are identical. This equivalence
can be extended beyond the singleton predictive random set case whenever one
conditional IM is a one-to-one reparametrization of the other. Basically, our type of conditional
IM uniqueness stems from the two-step hierarchical representation of the sampling model
given above. That is, any decomposition (3.1) specifies a hierarchical representation, and
since these are all equivalent, so too are the corresponding conditional IMs, modulo choice
of predictive random set.

Remark 4. There are clearly some close connections between the result in Theorem 1
and Fisher's notion of sufficiency. At a very high level, both theories provide a sort of
dimension reduction. The key difference between the two is that sufficiency focuses on
reducing the dimension of the observable data, while Theorem 1 focuses on reducing
the dimension of the unobservable auxiliary variable. The conditional IM can,
in some cases, correspond to a sufficient statistic-type of reduction but, in light of the
equivalence in Remark 3, this is not necessary. In this sense, sufficiency, and the related
notions of completeness, minimal sufficiency, etc., are not fundamental concepts in the
IM framework. Moreover, in the classical framework, conditional inference (i.e., restricting
sampling distributions to relevant subsets based on ancillary statistics) is somewhat
elusive, but the idea is crystal clear in the conditional IM framework; see Theorem 2.

Remark 5. As we mentioned previously, conditional IMs and Theorem 1 have some
connections to Bayes' theorem, in particular, in how information is combined or aggregated
across samples. In fact, it can be shown that, in a certain sense, the Bayes solution is a
special case of conditional IMs. To see this, consider a simple but generic example. The
Bayes model, cast in terms of associations, is of the following form:

$\theta = U_0, \quad U_0 \sim P_{U_0} \quad \text{and} \quad X = a(U_0, U_1), \quad U_1 \sim P_{U_1},$

where $P_U$ for $U = (U_0, U_1)$ is such that $U_1$ is conditionally independent given $U_0$. Here $P_{U_0}$
is like the prior, and the distribution induced by $u_1 \mapsto a(\theta, u_1)$ given $U_0 = \theta$ determines
the likelihood. It is clear that the function $a(U_0, U_1)$ is fully observed, so the conditional
IM strategy would employ the conditional distribution of $U_0$ given the observed value $x$
of $a(U_0, U_1)$. It is a simple exercise to see that the belief function in Theorem 1 (the
one based on a "naive" IM) is exactly the Bayes posterior distribution function. So in
any problem with a known prior distribution, the Bayes solution can be obtained as a
special case of the conditional IM. No non-naive predictive random set is needed here
because the naive IM itself is valid; this is consistent with the simple corresponding fact
for posterior probabilities under a Bayes model with known prior.
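
The "simple exercise" can be checked numerically in a toy case. The sketch below assumes a hypothetical two-point parameter $U_0 \in \{0, 1\}$ with known prior and a Bernoulli observation $X = a(U_0, U_1) = 1\{U_1 < p_{U_0}\}$, $U_1 \sim \mathsf{Unif}(0, 1)$. Conditioning the auxiliary variable $U_0$ on the observed value of $a(U_0, U_1)$, as the naive conditional IM does, should reproduce the Bayes posterior.

```python
import random

def naive_conditional_im(x_obs, prior1, p_given, n_draws=200000, seed=4):
    # Monte Carlo version of the conditional distribution of U0 given a(U0, U1) = x_obs.
    rng = random.Random(seed)
    kept = ones = 0
    for _ in range(n_draws):
        u0 = 1 if rng.random() < prior1 else 0       # U0 ~ prior on {0, 1}
        x = 1 if rng.random() < p_given[u0] else 0   # X = a(U0, U1), U1 ~ Unif(0, 1)
        if x == x_obs:
            kept += 1
            ones += u0
    return ones / kept

def bayes_posterior(x_obs, prior1, p_given):
    # Exact posterior probability of theta = 1 from Bayes' theorem.
    like = [p if x_obs == 1 else 1.0 - p for p in p_given]
    num = prior1 * like[1]
    return num / (num + (1.0 - prior1) * like[0])

print(naive_conditional_im(1, 0.3, (0.2, 0.7)), bayes_posterior(1, 0.3, (0.2, 0.7)))
```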

Remark 6. As a follow-up to Remark 5, since a full prior is not required to construct a
conditional IM, it is possible to develop an inferential framework based on conditional IMs
and "partial prior information." For example, valid prior information may be available
for some but not all components of $\theta$. Incorporating the prior information where it is
available, while remaining prior-free where it is not, can be achieved by a slight extension
of the argument in the previous remark. This important application of conditional IMs
deserves further investigation.



3.4 Validity of conditional IMs 



Here we extend the validity results in Martin and Liu (2012a) to the conditional IM
context. The main obstacle is that the distribution $P_{\mathcal{S}}$, determined by the
conditional distribution $P_{V_T|H(x)}$ in Theorem 1, depends on data through the value $H(x)$.
This is handled in Theorem 2 below by conditioning on the observed value of $H(X)$.

Fix $h \in H(\mathbb{X})$, and let $\mathbb{S}_h$ be a collection of $P_{V_T|h}$-measurable subsets of $\mathbb{V}_T$. To keep
notation simpler, assume that $\mathbb{S}_h$ contains both $\varnothing$ and $\mathbb{V}_T$. Like before, we also assume
that $\mathbb{S}_h$ is nested in the sense that either $S \subseteq S'$ or $S' \subseteq S$ for all $S, S' \in \mathbb{S}_h$. Now we say
that $\mathcal{S}$ is a conditionally admissible predictive random set, given $h$, if the support $\mathbb{S}_h$ is
nested and if its distribution $P_{\mathcal{S}|h}$ satisfies

$P_{\mathcal{S}|h}\{\mathcal{S} \subseteq K\} = \sup_{S \in \mathbb{S}_h : S \subseteq K} P_{V_T|h}(S), \quad K \subseteq \mathbb{V}_T.$ (3.5)

So, in this case, the distribution of $\mathcal{S}$ depends on the particular $h$. With the help of
Lemma 1 in Appendix A, we have the following extension of the validity theorem to the
case of conditional IMs.

Theorem 2. For any $h$, suppose that $\mathcal{S}$ is conditionally admissible, given $h$, with
distribution $P_{\mathcal{S}|h}$ as in (3.5). If $\Theta_{T(x)}(\mathcal{S}) \neq \varnothing$ with $P_{\mathcal{S}|h}$-probability 1 for all $x$ such that
$H(x) = h$, then the conditional IM is conditionally valid, i.e., for any $A \subseteq \Theta$,

$\sup_{\theta \notin A} P_{X|\theta}\{\mathsf{bel}_{T(X)|h}(A; \mathcal{S}) \geq 1 - \alpha \mid H(X) = h\} \leq \alpha, \quad \forall\, \alpha \in (0, 1).$ (3.6)

Now is a good time to recall Remark 1. More general decompositions of the baseline
association are allowed in Theorem 1, but only for the "separable" version (3.1a) is
it possible to prove a conditional validity theorem. The point is that a condition like
$c(X, U) = 0$ does not identify a fixed subset of the sample space on which probability
calculations can be restricted; the subspace would depend on $U$.

Since the calibration property in Theorem 2 holds for all assertions $A$, we may translate
(3.6) to a statement in terms of the corresponding plausibility function:

$\sup_{\theta \in A} P_{X|\theta}\{\mathsf{pl}_{T(X)|h}(A; \mathcal{S}) \leq \alpha \mid H(X) = h\} \leq \alpha, \quad \forall\, \alpha \in (0, 1).$ (3.7)
So, in addition to providing an objective scale for interpreting the conditional belief and
plausibility function values, (3.7) provides desirable properties of conditional IM-based
frequentist procedures. For example, if $h = H(x)$ is observed, the conditional $100(1 - \alpha)\%$
plausibility region for $\theta$ is $\{\theta : \mathsf{pl}_{T(x)|h}(\theta; \mathcal{S}) > \alpha\}$. Then, by (3.7), the conditional coverage
probability is $P_{X|\theta}\{\mathsf{pl}_{T(X)|h}(\theta; \mathcal{S}) > \alpha \mid H(X) = h\} \geq 1 - \alpha$. In Fisher's mind, this is
a more meaningful coverage probability since it is conditioned on a particular aspect of
the observed data, namely, $H(x) = h$. In other words, the probability calculation focuses
on a "relevant subset" $\{x : H(x) = h\}$ of the sample space. In some cases, though,
conditional validity is the same as ordinary validity.

Corollary 1. Suppose that the predictive random set $\mathcal{S}$ does not depend on the observed
$H(x) = h$, so that $P_{\mathcal{S}|h} = P_{\mathcal{S}}$ and $\mathsf{bel}_{T(x)|h} = \mathsf{bel}_{T(x)}$. Then under the conditions of
Theorem 2, the conditional IM is unconditionally valid, i.e., for any $A \subseteq \Theta$,

$\sup_{\theta \notin A} P_{X|\theta}\{\mathsf{bel}_{T(X)}(A; \mathcal{S}) \geq 1 - \alpha\} \leq \alpha, \quad \forall\, \alpha \in (0, 1).$

Two possible ways the condition of Corollary 1 may hold are as follows. First, in the
P-step, the user may specify $\mathcal{S}$ directly without dependence on the observed $H(x) = h$;
see Section 5.1. Second, it could happen that $V_T$ and $V_H$ are statistically independent,
in which case the distribution for $\mathcal{S}$ is determined by the marginal distribution of $V_T$,
which does not depend on $h$.

4 Finding conditional associations 

4.1 Familiar things 

In many problems, finding a decomposition (3.1) in Theorem 1 and the corresponding
conditional association is easy to do. In general, the Neyman-Fisher factorization
theorem implies we can define a conditional association through the marginal distribution
of the minimal sufficient statistic $T(x)$. In standard problems, such as full-rank
exponential families, the minimal sufficient statistics are easily obtained so this is probably
the simplest approach. This, of course, includes both discrete and continuous problems.
Similarly, if the problem has a group structure, invariance considerations can be used to
find a decomposition; see Section 5.1. Note, however, that in light of Remark 3, one can
consider other conditional associations if desirable. When the minimal sufficient statistic
has dimension larger than that of the parameter, e.g., in curved exponential families, then
some special conditioning is required; see Section 5.2.

4.2 A new differential equations-based technique 

Here we describe a novel technique for finding conditional associations, based on differen- 
tial equations. The method can be used for going directly from the baseline association to 
something lower-dimensional. In fact, in those nice problems mentioned above, it is easy 
to check that this differential equation-based technique reproduces the solutions based on 
minimal sufficiency, group invariance, etc. However, in our experience, this new approach 
is especially powerful in cases where the familiar things fail to give a fully satisfactory 
reduction. In such cases, the differential equation-based technique can provide a further
dimension reduction, beyond what sufficiency alone can give. 

For concreteness, suppose $\Theta \subseteq \mathbb{R}$; the multi-parameter case can be handled similarly.
The intuition is that $\psi_T$ should map $\mathbb{U} \subseteq \mathbb{R}^n$ to $\Theta$, so that $V_T = \psi_T(U)$ is one-dimensional,
like $\theta$. Moreover, $\psi_H$ should map $\mathbb{U}$ into an $(n - 1)$-dimensional manifold in $\mathbb{R}^n$, and be
insensitive to changes in $\theta$ in the following sense. For baseline association $x = a(\theta, u)$,
suppose that $u_{x,\theta}$ is the unique solution for $u$. Then for fixed $x$, we require that $\psi_H(u_{x,\theta})$
be constant in $\theta$. In other words, we require that $\partial u_{x,\theta} / \partial\theta$ exists and

$0 = \dfrac{\partial \psi_H(u_{x,\theta})}{\partial\theta} = \dfrac{\partial \psi_H(u)}{\partial u}\Big|_{u = u_{x,\theta}} \cdot \dfrac{\partial u_{x,\theta}}{\partial\theta},$ (4.1)

where the left-hand side is $n \times 1$, the Jacobian $\partial\psi_H(u)/\partial u$ is $n \times n$ with rank $n - 1$, and
$\partial u_{x,\theta}/\partial\theta$ is $n \times 1$.

It is clear from the construction that, if a solution $\psi_H$ of this (constrained) partial
differential equation exists, then the value of $\psi_H(U)$ is fully observed, i.e., there is a corresponding
function $H$, not depending on $\theta$, such that $H(X) = \psi_H(U)$. So, with appropriate choice
of $\psi_T$, the solution $\psi_H$ of (4.1) determines the decomposition (3.1) in Theorem 1.

Formal theory on existence of solutions and on solving the differential equation system
(4.1) is available. For example, the method of characteristics described in Polyanin et al.
(2002) is a powerful tool for solving such systems. However, such formalities here will take
us too far off track. Examples of this method in action are given in Sections 5.2, 6.4, and
6.5. In the first two cases, this differential equations method is applied after an initial
step based on sufficiency, etc., provides an unsatisfactory dimension reduction.
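
As a quick sanity check of (4.1), consider a location model. This worked example is only a sketch, and it uses the sample mean rather than a maximum likelihood estimator purely for illustration. Take the baseline association $x = \theta 1_n + u$, so that $u_{x,\theta} = x - \theta 1_n$, and try $\psi_H(u) = u - \bar u 1_n$ with $\bar u = n^{-1} 1_n^\top u$. Then

\[ \frac{\partial u_{x,\theta}}{\partial\theta} = -1_n, \qquad \frac{\partial \psi_H(u)}{\partial u} = I_n - n^{-1} 1_n 1_n^\top, \]

\[ \frac{\partial \psi_H(u)}{\partial u}\Big|_{u = u_{x,\theta}} \cdot \frac{\partial u_{x,\theta}}{\partial\theta} = -(I_n - n^{-1} 1_n 1_n^\top) 1_n = -(1_n - 1_n) = 0. \]

The Jacobian $I_n - n^{-1} 1_n 1_n^\top$ is a projection matrix of rank $n-1$, as (4.1) requires, and the corresponding observed function is $H(x) = x - \bar x 1_n$, the vector of residuals, which does not depend on $\theta$.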



5 Two detailed examples 
5.1 A Student-t location problem 

Suppose $X_1, \ldots, X_n$ is an independent sample from a Student-t distribution $t_\nu(\theta)$, where
the degrees of freedom $\nu$ is known but the location $\theta$ is unknown. This is a relatively
challenging problem from a classical point of view because there is no satisfactory reduction
via sufficiency. For the IM approach, start with a baseline association $X = \theta 1_n + U$,
with $U = (U_1, \ldots, U_n)^\top$ and $U_i \sim t_\nu$, independent, for $i = 1, \ldots, n$. For this location
parameter problem, invariance considerations suggest the following decomposition:

\[ X - T(X) 1_n = U - T(U) 1_n \quad \text{and} \quad T(X) = \theta + T(U), \]

where $T(\cdot)$ is the maximum likelihood estimator. Let $V_T = T(U)$ and $V_H = H(U) =
U - T(U) 1_n$. If $h$ is the observed $H(X)$, then it follows from the result of Barndorff-Nielsen
(1983) that the conditional distribution of $V_T$, given $V_H = h$, has a density

\[ f_{\nu,h}(v_T) = c(\nu, h) \prod_{i=1}^n \{\nu + (v_T + h_i)^2\}^{-(\nu+1)/2}, \]

where $c(\nu, h)$ is a normalizing constant that depends only on $\nu$ and $h$. If we write $F_{\nu,h}$
for the distribution function corresponding to the density $f_{\nu,h}$ above, then a conditional
IM for $\theta$ can be built based on the following association (cf. Remark 2):

\[ T(X) = \theta + F_{\nu,h}^{-1}(W), \quad W \sim \mathsf{Unif}(0,1). \]

                Coverage probability (by ν)        Expected length (by ν)
Method    n     3      5      10     25            3      5      10     25
CIM       5     0.944  0.949  0.951  0.949         2.28   2.08   1.93   1.83
          10    0.949  0.951  0.952  0.953         1.56   1.45   1.35   1.29
          25    0.953  0.944  0.951  0.949         0.97   0.91   0.85   0.81
          50    0.953  0.951  0.953  0.947         0.68   0.64   0.60   0.58
MLE       5     0.915  0.933  0.944  0.940         2.10   1.98   1.88   1.81
          10    0.938  0.942  0.944  0.946         1.50   1.42   1.34   1.28
          25    0.946  0.939  0.943  0.940         0.96   0.90   0.85   0.81
          50    0.950  0.941  0.944  0.938         0.68   0.64   0.60   0.58

Table 1: Coverage probabilities and expected lengths of the 95% plausibility/confidence
intervals for $\theta$ in the Student-t example based on, respectively, the conditional IM (CIM)
and asymptotic normality of the maximum likelihood estimate (MLE).

With this conditional association, we are ready for the P- and C-steps. For simplicity,
in the P-step we elect to take the predictive random set $S$ as in (2.2); this also has
some theoretical justification since $f_{\nu,h}$ should be approximately symmetric about $v_T = 0$
(Martin and Liu 2012a, Sec. 4.3.2). For the C-step, the random set $\Theta_{T(x)}(S)$ is

\[ \bigl[ T(x) - F_{\nu,h}^{-1}(0.5 + |W - 0.5|),\; T(x) - F_{\nu,h}^{-1}(0.5 - |W - 0.5|) \bigr], \quad W \sim \mathsf{Unif}(0,1). \]

From this point, numerical methods can be used to compute the conditional belief and
plausibility functions. For example, if $A = \{\theta\}$ is a singleton assertion, then

\[ \mathsf{pl}_{T(x)|h}(\theta; S) = 1 - |1 - 2 F_{\nu,h}(\theta - T(x))|, \]

and the corresponding $100(1-\alpha)$% plausibility interval for $\theta$ is

\[ \{\theta : \mathsf{pl}_{T(x)|h}(\theta; S) > \alpha\} = \bigl( T(x) + F_{\nu,h}^{-1}(\alpha/2),\; T(x) + F_{\nu,h}^{-1}(1 - \alpha/2) \bigr). \]

For illustration, we present the results of a simple simulation study. In particular, for
several pairs $(n, \nu)$, 5000 Monte Carlo samples of size $n$ are obtained from a Student-t
distribution with $\nu$ degrees of freedom and center $\theta = 0$. For each sample, the 95%
plausibility interval for $\theta$ based on the conditional IM above is obtained. For comparison,
we also compute the 95% confidence interval based on the asymptotic normality of the
maximum likelihood estimate. The results of this simulation are summarized in Table 1.
The general message is that while the classical confidence intervals are a bit shorter than
the conditional IM plausibility intervals on average, the former tend to undershoot the
target coverage probability while the latter are typically on target.

To conclude this example, recall our argument for conditional IM uniqueness in Remark 3.
Here we can make a stronger statement. In the Student-t simulation example
above, we also did the calculations with an alternative decomposition which took $V_T = U_1$
and $V_H = (0, U_2 - U_1, \ldots, U_n - U_1)$. Although Remark 3 suggests it, we were surprised to
see that the results obtained with this "naive" decomposition were indistinguishable from
those based on the arguably more reasonable maximum likelihood-driven decomposition.
This suggests that the choice of decomposition does not affect the final results, provided
that the conditioning is done correctly.

5.2 Fisher's problem of the Nile

Suppose two independent exponential samples, namely $X_1 = (X_{11}, \ldots, X_{1n})$ and $X_2 =
(X_{21}, \ldots, X_{2n})$, are available, the first with mean $\theta^{-1}$ and the second with mean $\theta$. The
goal is to make inference on $\theta > 0$. The name comes from an application (Fisher 1973) to
fertility of land in the Nile river valley. In this example, the maximum likelihood estimate
is not sufficient, so conditioning on an ancillary statistic is recommended.

Sufficiency considerations suggest the following initial dimension reduction step:

\[ S(X_1) = \theta^{-1} U_1 \quad \text{and} \quad S(X_2) = \theta U_2, \quad U_1, U_2 \sim \mathsf{Gamma}(n, 1), \]

where $S(X_j) = \sum_{i=1}^n X_{ji}$, $j = 1, 2$. But efficiency can be gained by considering a further reduction
to a scalar auxiliary variable. Here we employ the differential equation technique in
Section 4.2. Start with $u_{x,\theta} = (\theta S(x_1), \theta^{-1} S(x_2))^\top$. Differentiating with respect to $\theta$
reveals that our (real-valued) conditioning function $\psi_H$ must satisfy

\[ \frac{\partial \psi_H(u)}{\partial u}\Big|_{u = u_{x,\theta}} \begin{pmatrix} S(x_1) \\ -\theta^{-2} S(x_2) \end{pmatrix} = 0. \]

If we take $\psi_H(u) = \{u_1 u_2\}^{1/2}$, then

\[ \frac{\partial \psi_H(u)}{\partial u}\Big|_{u = u_{x,\theta}} = \frac{\bigl(\theta^{-1} S(x_2),\; \theta S(x_1)\bigr)}{2\{S(x_1) S(x_2)\}^{1/2}} \]

and, clearly, this satisfies the differential equation above. Therefore, for (3.1), we take

\[ H(X) = V_H \quad \text{and} \quad T(X) = \theta V_T, \tag{5.1} \]

where $T(X) = \{S(X_2)/S(X_1)\}^{1/2}$, $H(X) = \{S(X_1) S(X_2)\}^{1/2}$, $V_T = \{U_2/U_1\}^{1/2}$, and
$V_H = \{U_1 U_2\}^{1/2}$. These quantities are familiar from the classical approach: $T(X)$ is the
maximum likelihood estimate of $\theta$, $H(X)$ is an ancillary statistic, and the pair $(T, H)(X)$
is a jointly minimal sufficient statistic (Ghosh et al. 2010).
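
The differential equation check in this example can also be done numerically. The sketch below is illustrative only: `s1` and `s2` are hypothetical observed values standing in for $S(x_1)$ and $S(x_2)$, and the function names are ours. It confirms, by finite differences and by the chain rule, that $\psi_H(u_{x,\theta}) = \{u_1 u_2\}^{1/2}$ does not change with $\theta$.

```python
import math

def u_of(theta, s1, s2):
    """Solve S(x1) = U1 / theta and S(x2) = theta * U2 for (U1, U2)."""
    return (theta * s1, s2 / theta)

def psi_H(u):
    """Candidate conditioning function psi_H(u) = sqrt(u1 * u2)."""
    return math.sqrt(u[0] * u[1])

def dpsi_dtheta(theta, s1, s2, eps=1e-6):
    """Central finite-difference derivative of theta -> psi_H(u_{x,theta})."""
    up = psi_H(u_of(theta + eps, s1, s2))
    down = psi_H(u_of(theta - eps, s1, s2))
    return (up - down) / (2.0 * eps)

def chain_rule(theta, s1, s2):
    """Gradient of psi_H at u_{x,theta} times d u_{x,theta} / d theta."""
    u1, u2 = u_of(theta, s1, s2)
    root = math.sqrt(u1 * u2)
    grad = (u2 / (2.0 * root), u1 / (2.0 * root))
    du = (s1, -s2 / theta ** 2)
    return grad[0] * du[0] + grad[1] * du[1]

s1, s2 = 18.3, 22.1  # hypothetical values of S(x1) and S(x2)
for theta in (0.5, 1.0, 2.0):
    # psi_H stays at sqrt(s1 * s2); both derivatives vanish up to rounding
    print(theta, psi_H(u_of(theta, s1, s2)),
          dpsi_dtheta(theta, s1, s2), chain_rule(theta, s1, s2))
```

Since $\psi_H(u_{x,\theta})$ equals $\{S(x_1) S(x_2)\}^{1/2}$ for every $\theta$, the printed value in the second column is the observed $H(x)$.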

By Theorem 1 and (5.1), we can focus on a conditional association based on $T(X) =
\theta V_T$. The conditional distribution of $V_T$ given $V_H = h$ is a generalized inverse Gaussian
distribution (Barndorff-Nielsen 1977) with density function

\[ f_h(v_T) = \frac{1}{2 v_T K_0(2h)} \exp\{-h(v_T^{-1} + v_T)\}, \tag{5.2} \]

where $K_0$ is the modified Bessel function of the second kind. As a final simplifying step
(cf. Remark 2), write the conditional association as

\[ T(X) = \theta F_h^{-1}(W), \quad W \sim \mathsf{Unif}(0,1), \tag{5.3} \]

where $F_h$ is the distribution function corresponding to the density $f_h$ in (5.2). This
completes the A-step. If we take $S$ as in (2.2) for the P-step, then the C-step gives

\[ \Theta_{T(x)}(S) = \Bigl[ \frac{T(x)}{F_h^{-1}(0.5 + |W - 0.5|)},\; \frac{T(x)}{F_h^{-1}(0.5 - |W - 0.5|)} \Bigr], \quad W \sim \mathsf{Unif}(0,1). \]

Figure 1: Plausibility functions for the conditional IM (black) and the "naive" conditional
IM (gray) in the Nile example, with $T = 0.90$, $n = 20$, and the true $\theta = 1$; panels (a) $h = 15$
and (b) $h = 25$, with $\theta$ on the horizontal axis. Gray curves
in the two plots are the same since the naive conditional IM does not depend on $h$.

From this, the conditional belief/plausibility functions are readily evaluated.

For illustration, we display plausibility functions $\mathsf{pl}_t(\theta; S)$ for two conditional IMs.
The first is based on that derived above; the second is based on a similar derivation, but
we ignore $V_H$ and simply work with the marginal distribution of $V_T$ in (5.1). Figure 1
shows plausibility functions for $T(x) = 0.90$, with $n = 20$ and true $\theta = 1$, sampled from
its conditional distribution given $h$, for two different values of $h$. In this case, if $h$ is
large (i.e., $h > n$), then the bona fide conditional IM has narrower level sets than the
naive conditional IM. The opposite is true when $h$ is small (i.e., $h < n$). This is due to
the fact that the conditional Fisher information in $T$ is an increasing function of $h$; see
Ghosh et al. (2010, Example 1). Therefore, $T$ has more variability when $h$ is small, and
this adjustment should be reflected in the plausibility function. The bona fide conditional
IM catches this phenomenon while the naive one does not.
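
The effect of $h$ on the width of the plausibility level sets can be explored numerically. The following is a minimal sketch, not the code used to produce the figure. It assumes the default predictive random set, in which case the C-step random set above gives the singleton plausibility $1 - |1 - 2F_h(T(x)/\theta)|$, and it uses the fact that, on the scale $s = \log v_T$, the density (5.2) is proportional to $\exp\{-2h\cosh(s)\}$. The truncation `cutoff`, the grids, and all function names are our own choices.

```python
import math

def _trapz(f, a, b, m=2000):
    """Composite trapezoid rule for f on [a, b] with m cells."""
    if b <= a:
        return 0.0
    step = (b - a) / m
    total = 0.5 * (f(a) + f(b))
    for i in range(1, m):
        total += f(a + i * step)
    return total * step

def make_cdf(h, cutoff=6.0):
    """Return the map v -> F_h(v) for the density in (5.2).

    With s = log(v) the density is proportional to exp{-2 h cosh(s)};
    it is rescaled by exp(2 h) to avoid underflow at the peak.
    Truncating to |s| <= cutoff is a numerical assumption.
    """
    def g(s):
        return math.exp(-2.0 * h * (math.cosh(s) - 1.0))

    norm = _trapz(g, -cutoff, cutoff)

    def cdf(v):
        s = max(-cutoff, min(cutoff, math.log(v)))
        return _trapz(g, -cutoff, s) / norm

    return cdf

def plausibility(theta, t_obs, cdf):
    """Singleton plausibility 1 - |1 - 2 F_h(T(x) / theta)|."""
    return 1.0 - abs(1.0 - 2.0 * cdf(t_obs / theta))

def plausibility_interval(t_obs, h, alpha=0.05):
    """Scan a theta grid and keep the values with plausibility above alpha."""
    cdf = make_cdf(h)
    grid = [0.30 + 0.01 * i for i in range(271)]  # theta in [0.30, 3.00]
    keep = [th for th in grid if plausibility(th, t_obs, cdf) > alpha]
    return min(keep), max(keep)

for h in (15.0, 25.0):
    print(h, plausibility_interval(0.90, h))
```

Under these assumptions, the interval for the larger $h$ should come out narrower, in line with the discussion of conditional Fisher information above.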



6 Local conditional IMs 



6.1 Motivation 



So far we have seen that the conditional IM approach is successful in problems where the
baseline association admits a decomposition of the form (3.1a). However, as alluded to
above, there are interesting and important problems where apparently no such decomposition
exists. Next is one such problem, which may be considered as a "benchmark
example" for conditional inference (Ghosh et al. 2010, Example 5).

Suppose $(X_{11}, X_{21}), \ldots, (X_{1n}, X_{2n})$ is an independent sample from a standard bivariate
normal distribution with zero means, unit variances, but unknown correlation coefficient
$\theta \in (-1, 1)$. A natural first step towards inference on $\theta$ is to take advantage of the
fact that $X_1 + X_2$ and $X_1 - X_2$ are independent. In particular, by defining

\[ X_1 = \frac{1}{2} \sum_{i=1}^n (X_{1i} + X_{2i})^2 \quad \text{and} \quad X_2 = \frac{1}{2} \sum_{i=1}^n (X_{1i} - X_{2i})^2, \]

we may rewrite the baseline association as

\[ X_1 = (1 + \theta) U_1 \quad \text{and} \quad X_2 = (1 - \theta) U_2, \quad U_1, U_2 \sim \mathsf{ChiSq}(n). \tag{6.1} \]

Sufficiency justifies this first reduction. Equation (6.1) is equivalent to

\[ \frac{X_1}{U_1} + \frac{X_2}{U_2} = 2 \quad \text{and} \quad \frac{X_1}{X_2} = \frac{(1 + \theta) U_1}{(1 - \theta) U_2}. \tag{6.2} \]

The first equation depends on data and auxiliary variable, free of $\theta$, while the second
depends also on $\theta$. But note that the first expression in (6.2) is not of the form
specified in (3.1a). In fact, this first expression is of the more general "non-separable" form
$c(X, U) = 0$ described in Remark 1. So, although (6.2) provides a suitable decomposition
of the baseline association, the requirements of Theorem 2 are not met, so the resulting
conditional IM may not be valid.



6.2 Relaxing (3.1a) via localization

As described above, the separability in (3.1a) can be too strict, but extending the conditional
validity theorem to allow non-separability appears difficult. The idea here is to
relax (3.1a) in a different direction. Specifically, we propose to allow the pair of functions
$(H, \psi_H)$ in (3.1a) to depend, locally, on the parameter. This generalization allows us
some additional flexibility in finding an auxiliary variable dimension reduction.

Start by fixing an arbitrary $\theta_0 \in \Theta$. As in Theorem 1, consider a pair of functions
$(T, H_{\theta_0})$, depending on $\theta_0$, such that $x \mapsto (T(x), H_{\theta_0}(x))$ is one-to-one. Now take the
corresponding functions $u \mapsto (\psi_T(u), \psi_{H,\theta_0}(u))$, one-to-one, such that the baseline association,
at $\theta = \theta_0$, can be decomposed as

\[ H_{\theta_0}(X) = \psi_{H,\theta_0}(U) \quad \text{and} \quad T(X) = a_T(\psi_T(U), \theta_0). \tag{6.3} \]

That is, (6.3), with $U \sim P_U$, describes the sampling distribution $X \sim P_{X|\theta_0}$. Suppose
$H_{\theta_0}(X) = h_0$ is observed. We can compute the conditional distribution $P_{V_T|h_0,\theta_0}$ of $V_T =
\psi_T(U)$ given $\psi_{H,\theta_0}(U) = h_0$, which is then used to construct predictive random sets.

From this point, we may proceed exactly as before. That is, for the A-step, we get
sets $\Theta_{T(x)}(v_T) = \{\theta : T(x) = a_T(v_T, \theta)\}$ just like before. For the P-step, we pick a
conditionally admissible predictive random set $S \sim P_{S|h_0,\theta_0}$. Finally, the C-step produces
the conditional plausibility function

\[ \mathsf{pl}_{T(x)|h_0,\theta_0}(A; S) = 1 - P_{S|h_0,\theta_0}\{\Theta_{T(x)}(S) \subseteq A^c\}, \quad A \subseteq \Theta. \]

We shall refer to the corresponding conditional IM as a local conditional IM at $\theta = \theta_0$.
The adjective "local" is meant to indicate the dependence of the construction on the
particular point $\theta_0$. As we see below, the validity properties of this local conditional IM
are, in a certain sense, also local.

6.3 Validity of local conditional IMs



The following theorem shows that for each $\theta_0$ value, the local conditional IM at $\theta_0$ is valid
for some important assertions depending on the particular $\theta_0$. The proof is exactly like
that of Theorem 2 and, hence, omitted.

Theorem 3. For any $\theta_0$, take $h_0 \in H_{\theta_0}(\mathbb{X})$. Suppose that $S \sim P_{S|h_0,\theta_0}$ is conditionally
admissible. If $\Theta_{T(x)}(S) \neq \varnothing$ with $P_{S|h_0,\theta_0}$-probability 1 for all $x$ such that $H_{\theta_0}(x) = h_0$,
then the local conditional IM at $\theta_0$ is conditionally valid for $A = \{\theta_0\}$, i.e.,

\[ P_{X|\theta_0}\{\mathsf{pl}_{T(X)|h_0,\theta_0}(\theta_0; S) \le \alpha \mid H_{\theta_0}(X) = h_0\} \le \alpha, \quad \forall\, \alpha \in (0, 1). \]

The validity result here is not as strong as in Theorem 2, a consequence of the localization.
It does, however, imply that the local conditional plausibility region,

\[ \{\theta : \mathsf{pl}_{T(x)|H_\theta(x),\theta}(\theta; S) > \alpha\}, \tag{6.4} \]

has the nominal (conditional) $1 - \alpha$ coverage probability. This theoretical result is confirmed
by the simulation experiment in Section 6.4 below. Observe that, in the definition
of conditional plausibility region (6.4), the plausibility function depends on $\theta$ in two
places: in the argument (the assertion) and in the local conditional IM itself. The latter
structural dependence of the IM on the particular assertion is consistent with the
optimality developments described in Martin and Liu (2012a).



6.4 Bivariate normal model, revisited 

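Here we demonstrate that the localization technique can be successfully used to solve
the bivariate normal problem described above. Start with the relation in (6.1). Fix
$\theta_0$. To construct the functions $(H_{\theta_0}, \psi_{H,\theta_0})$, depending on $\theta_0$, and the corresponding local
conditional IM at $\theta_0$, we shall modify the differential equation approach in Section 4.2.
In this case, if we let $u_{x,\theta} = (x_1/(1 + \theta), x_2/(1 - \theta))^\top$, then we have

\[ \frac{\partial u_{x,\theta}}{\partial\theta} = \Bigl( -\frac{x_1}{(1 + \theta)^2},\; \frac{x_2}{(1 - \theta)^2} \Bigr)^\top. \]

For a local conditional IM at $\theta_0$, we propose to choose a real-valued $\psi_{H,\theta_0}(u)$ such that
$\partial \psi_{H,\theta_0}(u_{x,\theta})/\partial\theta$ vanishes at $\theta = \theta_0$. If we take

\[ \psi_{H,\theta_0}(u) = (1 + \theta_0) \log u_1 + (1 - \theta_0) \log u_2, \tag{6.5} \]

then

\[ \frac{\partial \psi_{H,\theta_0}(u)}{\partial u} = \Bigl( \frac{1 + \theta_0}{u_1},\; \frac{1 - \theta_0}{u_2} \Bigr), \]

so the derivative of $\psi_{H,\theta_0}(u_{x,\theta})$ with respect to $\theta$ is

\[ \frac{\partial \psi_{H,\theta_0}(u_{x,\theta})}{\partial\theta} = \frac{\partial \psi_{H,\theta_0}(u)}{\partial u}\Big|_{u = u_{x,\theta}} \frac{\partial u_{x,\theta}}{\partial\theta} = -\frac{(1 + \theta_0)(1 + \theta)}{x_1} \cdot \frac{x_1}{(1 + \theta)^2} + \frac{(1 - \theta_0)(1 - \theta)}{x_2} \cdot \frac{x_2}{(1 - \theta)^2} = \frac{1 - \theta_0}{1 - \theta} - \frac{1 + \theta_0}{1 + \theta}. \]

The latter expression clearly evaluates to zero at $\theta = \theta_0$, so $\psi_{H,\theta_0}$ satisfies the desired
differential equation. The corresponding function $H(x) = H_{\theta_0}(x)$ is given by

\[ H_{\theta_0}(x) = (1 + \theta_0) \log\{x_1/(1 + \theta_0)\} + (1 - \theta_0) \log\{x_2/(1 - \theta_0)\}. \]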

For the local conditional association, i.e., the second expression in (6.3), we take

\[ T(X) = z(\theta) + V_T, \]

where $T(x) = \log(x_1/x_2)$, $z(\theta) = \log\{(1 + \theta)/(1 - \theta)\}$, and $V_T = T(U)$. Then $P_{V_T|h_0,\theta_0}$
is the conditional distribution of $V_T$, given $(\theta_0, h_0)$, where $h_0$ is the observed $H_{\theta_0}(X) =
H_{\theta_0}(x)$. This conditional distribution has a density, given by

\[ f_{h_0,\theta_0}(v_T) \propto \exp\{-n\theta_0 v_T/2 - \cosh(v_T/2)\, e^{(h_0 - \theta_0 v_T)/2}\}. \]
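
One way to arrive at a density of this form is a direct change of variables; the following is a sketch of that calculation, which is not spelled out in the text. Write $a_j = \log U_j$, $j = 1, 2$, so that, for independent $U_j \sim \mathsf{ChiSq}(n)$, the joint density of $(a_1, a_2)$ is proportional to $\exp\{n(a_1 + a_2)/2 - (e^{a_1} + e^{a_2})/2\}$. With $v_T = a_1 - a_2$ and $h_0 = (1 + \theta_0) a_1 + (1 - \theta_0) a_2$, we have $a_1 + a_2 = h_0 - \theta_0 v_T$, and the linear map $(a_1, a_2) \mapsto (v_T, h_0)$ has constant Jacobian, so

\[ e^{a_1} + e^{a_2} = e^{(a_1 + a_2)/2} \bigl( e^{v_T/2} + e^{-v_T/2} \bigr) = 2 \cosh(v_T/2)\, e^{(h_0 - \theta_0 v_T)/2}, \]

\[ f_{h_0,\theta_0}(v_T) \propto \exp\{ n(h_0 - \theta_0 v_T)/2 - \cosh(v_T/2)\, e^{(h_0 - \theta_0 v_T)/2} \} \propto \exp\{ -n\theta_0 v_T/2 - \cosh(v_T/2)\, e^{(h_0 - \theta_0 v_T)/2} \}, \]

where the factor $e^{n h_0/2}$ is absorbed into the normalizing constant because $h_0$ is held fixed.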

If we let $F_{h_0,\theta_0}$ denote the corresponding distribution function, then (cf. Remark 2) we
can describe this conditional association model by

\[ T(X) = z(\theta) + F_{h_0,\theta_0}^{-1}(W), \quad W \sim \mathsf{Unif}(0, 1). \]

If, for the P-step, we use the predictive random set $S$ in (2.2), then the local conditional
plausibility function is

\[ \mathsf{pl}_{T(x)|h_0,\theta_0}(\theta_0; S) = 1 - |1 - 2 F_{h_0,\theta_0}(T(x) - z(\theta_0))|. \]

A local conditional $100(1 - \alpha)$% plausibility interval for $\theta$ can be found just as before, by
thresholding the plausibility function at $\alpha$. It follows from Theorem 3 that these intervals
will have the nominal coverage probabilities.
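
The thresholding step can be sketched numerically as follows. This is an illustrative implementation under our own assumptions, not the code behind the simulation reported next: the density is normalized by a trapezoid rule on a fixed grid for $v_T$ (adequate for moderate $n$ only), the candidate values $\theta_0$ are scanned on a grid of step 0.01, and the inputs `x1`, `x2` are the reduced statistics from (6.1), here set to hypothetical values.

```python
import math

def local_plausibility(theta0, x1, x2, n, lo=-15.0, hi=15.0, m=3000):
    """Local conditional plausibility of theta0 given the reduced data (x1, x2).

    The grid [lo, hi] for v_T and its size m are numerical assumptions.
    """
    # observed h0 = H_{theta0}(x) and the point T(x) - z(theta0)
    h0 = ((1.0 + theta0) * math.log(x1 / (1.0 + theta0))
          + (1.0 - theta0) * math.log(x2 / (1.0 - theta0)))
    target = math.log(x1 / x2) - math.log((1.0 + theta0) / (1.0 - theta0))

    def log_f(v):
        # unnormalized log density of V_T given (theta0, h0)
        return (-n * theta0 * v / 2.0
                - math.cosh(v / 2.0) * math.exp((h0 - theta0 * v) / 2.0))

    step = (hi - lo) / m
    logs = [log_f(lo + i * step) for i in range(m + 1)]
    peak = max(logs)
    dens = [math.exp(val - peak) for val in logs]

    # trapezoid cells; F is the mass left of target divided by the total mass
    cells = [0.5 * (dens[i] + dens[i + 1]) for i in range(m)]
    pos = (min(max(target, lo), hi) - lo) / step
    k = min(int(pos), m - 1)
    frac = pos - k
    d_mid = dens[k] + frac * (dens[k + 1] - dens[k])
    left = sum(cells[:k]) + 0.5 * (dens[k] + d_mid) * frac
    F = left / sum(cells)
    return 1.0 - abs(1.0 - 2.0 * F)

def plausibility_interval(x1, x2, n, alpha=0.05):
    """Keep the grid values theta0 whose local plausibility exceeds alpha."""
    grid = [-0.99 + 0.01 * i for i in range(199)]
    keep = [th for th in grid if local_plausibility(th, x1, x2, n) > alpha]
    return min(keep), max(keep)

# hypothetical reduced data for n = 10
print(local_plausibility(0.3, 13.0, 7.0, 10))
print(plausibility_interval(13.0, 7.0, 10))
```

Note that both $h_0$ and the conditional density are recomputed for every candidate $\theta_0$, which reflects the dependence of the local conditional IM on the assertion being evaluated.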

For illustration, we consider a simple simulation experiment. In particular, we compute
the local conditional 95% plausibility interval for $\theta$ for 5000 Monte Carlo samples
based on $\theta = 0.3$. For several values of $n$, the estimated coverage probabilities and expected
lengths are compared, in Table 2, to those of the conditional frequentist interval
based on the so-called "r*" approximation due to Barndorff-Nielsen (1986) and Fraser
(1990), summarized nicely in Reid (1995, 2003). As in Section 5.1, the general message
is that the conditional frequentist interval is a bit shorter than the local conditional
IM plausibility interval on average, but the former tends to undershoot the target
coverage probability 0.95 while the latter is on target for all $n$. That the frequentist intervals
are conditioned on the observed value of an ancillary statistic makes them easier
to interpret compared to an unconditional interval, but they still lack the probabilistic
interpretation of the plausibility intervals.



6.5 A variance-components example 

Suppose $X_1, \ldots, X_n$ are independent with $X_i \sim \mathsf{N}(0, 1 + w_i\theta)$, $i = 1, \ldots, n$, where the $w_i$'s
are known constants (not all equal) and inference on the unknown $\theta \ge 0$ is desired.
Such a model arises in an unbalanced hierarchical analysis of variance model framework:
given $\mu_i$, $Y_{i1}, \ldots, Y_{iw_i}$ are independent samples from $\mathsf{N}(\mu_i, 1)$, $i = 1, \ldots, n$, and $\mu_1, \ldots, \mu_n$
are independent $\mathsf{N}(0, \theta)$. In this case, $w_i$ is the size of the $i$th group, $i = 1, \ldots, n$. Set
$X_i = w_i^{1/2} \bar Y_{i\cdot}$, a rescaled version of the $i$th group mean. Then the marginal distribution
of $X_i$ is $\mathsf{N}(0, 1 + w_i\theta)$, the one assumed here.

           Coverage probability      Expected length
n          LCIM      r*              LCIM      r*
10         0.951     0.912           0.961     0.888
25         0.952     0.935           0.663     0.621
50         0.946     0.936           0.480     0.455
100        0.952     0.943           0.341     0.328
1000       0.947     0.938           0.108     0.104
10000      0.952     0.943           0.034     0.033

Table 2: Coverage probabilities and expected lengths of 95% plausibility and confidence
intervals for the correlation $\theta$ in the bivariate normal problem based on, respectively, the
local conditional IM (LCIM) and the r* approach reviewed by Reid (2003).

Although the model is relatively simple, inference on $\theta$ remains a challenge, partly due
to the fact that the primary question of interest is whether $\theta = 0$, a boundary point. In
particular, the case $\theta = 0$ corresponds to all the $\mu_i$'s in the hierarchical framework being
equal, i.e., none of the "treatments" are significant. Here we develop a local conditional
IM approach for inference on $\theta$, with focus on testing $H_0: \theta = 0$ versus the full one-sided
alternative $\theta > 0$. Confidence and Bayesian credible intervals for $\theta$ are developed in
Zhang and Woodroofe (2002); see, also, Martin (2012). These latter papers focus on a
balanced but more general version where the mean of the $\mu_i$'s and/or the variances of the
$Y_{ij}$'s are unknown but the "weights" $w_i$ are all equal. These more general problems can
be considered in the IM framework, but a special marginalization technique is required
which is beyond our present scope; see Martin and Liu (2012b).

Consider the following baseline association:

\[ X_i^2 = (1 + w_i\theta) U_i, \quad U_i \sim \mathsf{ChiSq}(1), \quad i = 1, \ldots, n. \tag{6.6} \]

To see that a local conditional IM is required, take, for the moment, the special case
$n = 2$. Then a $\theta$-free function of observable data and unobservable auxiliary variables
would be something like

\[ \frac{X_2^2}{U_2} - \frac{w_2}{w_1} \frac{X_1^2}{U_1} = 1 - \frac{w_2}{w_1}. \]

But the left-hand side of this display is clearly not of the separable form (3.1a), just
as in the bivariate normal example described in Section 6.1 above. Therefore, a regular
conditional IM may not be valid so we should look for a local conditional IM.

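To find a local conditional IM, we shall employ the PDE technique presented above.
Start by writing the relationship (6.6) in terms of $u$ for given $x$, $\theta$:

\[ u_{x,\theta,i} = \frac{x_i^2}{1 + w_i\theta}, \quad i = 1, \ldots, n. \]

The derivative of this quantity with respect to $\theta$ for fixed $x$ is

\[ \frac{\partial u_{x,\theta,i}}{\partial\theta} = -\frac{w_i x_i^2}{(1 + w_i\theta)^2} = -\frac{w_i}{1 + w_i\theta}\, u_{x,\theta,i}, \quad i = 1, \ldots, n. \]

The method of characteristics (Polyanin et al. 2002) for solving PDEs identifies the function
$\psi_{H,\theta}(\cdot)$ from $\mathbb{R}^n$ to $\mathbb{R}^{n-1}$, given by

\[ \psi_{H,\theta}(u)_i = c_{i+1}(\theta) \log u_{i+1} - c_i(\theta) \log u_i, \quad i = 1, \ldots, n-1, \]

where $c_i(\theta) = (1 + w_i\theta)/w_i$. The derivative of this function with respect to $u$ is an $(n-1) \times n$
matrix, whose $i$th row looks like

\[ \Bigl( 0, \ldots, 0, -\frac{c_i(\theta)}{u_i}, \frac{c_{i+1}(\theta)}{u_{i+1}}, 0, \ldots, 0 \Bigr), \quad i = 1, \ldots, n-1. \tag{6.7} \]

It is a simple calculation to check that, for each $i = 1, \ldots, n-1$, the row vector above,
evaluated at $u = u_{x,\theta}$, is orthogonal to the vector $\partial u_{x,\theta}/\partial\theta$. Therefore, if we fix $\theta = \theta_0$,
then $\psi_{H,\theta_0}(u)$ is indeed a solution to the PDE

\[ \frac{\partial \psi_{H,\theta_0}(u_{x,\theta})}{\partial\theta} = \frac{\partial \psi_{H,\theta_0}(u)}{\partial u}\Big|_{u = u_{x,\theta}} \frac{\partial u_{x,\theta}}{\partial\theta} = 0 \quad \text{at } \theta = \theta_0. \]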

We then have a decomposition (6.3) of the baseline association (6.6) given by

\[ \sum_{i=1}^n \log X_i^2 = \sum_{i=1}^n \log(1 + w_i\theta) + \sum_{i=1}^n \log U_i \]

for the $T(X) = a_T(\psi_T(U), \theta)$ part, and

\[ c_{i+1}(\theta_0) \log \frac{X_{i+1}^2}{w_{i+1} c_{i+1}(\theta_0)} - c_i(\theta_0) \log \frac{X_i^2}{w_i c_i(\theta_0)} = \psi_{H,\theta_0}(U)_i, \quad i = 1, \ldots, n-1, \]

for the $H_{\theta_0}(X) = \psi_{H,\theta_0}(U)$ part. Let $V_T = \sum_{i=1}^n \log U_i$. Then the conditional association
can be written as

\[ T(X) = z(\theta) + V_T, \quad V_T \sim P_{V_T|\theta_0,h_0}, \]

where $T(X) = \sum_{i=1}^n \log X_i^2$, $z(\theta) = \sum_{i=1}^n \log(1 + w_i\theta)$, and $P_{V_T|\theta_0,h_0}$ is the conditional distribution of $V_T = \psi_T(U)$
given $\theta_0$ and the observed value $h_0$ of $\psi_{H,\theta_0}(U)$. Since the relationship between $U$ and
$(\psi_T(U), \psi_{H,\theta_0}(U))$ is log-linear, i.e.,

\[ \begin{pmatrix} \psi_{H,\theta_0}(U)_1 \\ \vdots \\ \psi_{H,\theta_0}(U)_{n-1} \\ \psi_T(U) \end{pmatrix} = \begin{pmatrix} c_1 \\ \vdots \\ c_{n-1} \\ 1_n^\top \end{pmatrix} \begin{pmatrix} \log U_1 \\ \vdots \\ \log U_{n-1} \\ \log U_n \end{pmatrix}, \]

where the $1 \times n$ vectors $c_1, \ldots, c_{n-1}$ collect the coefficients in (6.7), i.e., $c_i$ has $-c_i(\theta_0)$ in
position $i$ and $c_{i+1}(\theta_0)$ in position $i+1$, and the $n$th row is all 1's, one can
easily find (numerically) the joint density and, hence, the conditional density $f_{\theta_0,h_0}(v_T)$
of $V_T$ given $\psi_{H,\theta_0}(U) = h_0$. We omit the details of this calculation here. Therefore, the
conditional association can be rewritten as

\[ T(X) = z(\theta) + F_{\theta_0,h_0}^{-1}(R), \quad R \sim \mathsf{Unif}(0, 1), \]

where $F_{\theta_0,h_0}$ is the distribution function corresponding to $f_{\theta_0,h_0}$.

This looks similar to the conditional association in the bivariate normal example.
However, for certain assertions of interest, there is an additional obstacle to be overcome
in this case, namely, "conflict cases." That is, $T(X)$ and $F_{\theta_0,h_0}^{-1}(R)$ can be arbitrary real
numbers, but $z(\theta)$ is non-negative, so not all pairs $(X, R)$ are compatible with the conditional
association stated above. In other words, there is a special, almost imperceptible
constraint which we must deal with. For problems with conflict cases, there is an efficient
modification of the IM approach laid out in Ermini Leaf and Liu (2012); fortunately, for
the problem of interest here, we will not need this modification.

In our case, we are interested in the assertion $A = \{0\}$, i.e., that the treatments are
not significant. Since $\Theta = [0, \infty)$, $A^c$ is a "one-sided" assertion and the arguments in
Martin and Liu (2012a, Theorem 4) suggest that the optimal predictive random set in
this problem is $S = [0, R]$, $R \sim \mathsf{Unif}(0, 1)$. In this case, we take $\theta_0 = 0$ and we have

\[ \Theta_t(r) = \{\theta : z(\theta) = t - F_{0,h_0}^{-1}(r)\}, \quad t \in \mathbb{R}, \quad r \in [0, 1]. \]

With the optimal predictive random set $S$, we then get

\[ \Theta_t(S) = \bigcup_{r \in S} \Theta_t(r) = \{\theta : z(\theta) \ge t - F_{0,h_0}^{-1}(R)\}, \quad R \sim \mathsf{Unif}(0, 1). \]

These random sets are non-empty with $P_S$-probability 1, so we can effectively ignore the
constraint mentioned above. But, if interest were in more general singleton assertions,
say, for constructing a plausibility interval, then more care would be needed.

With this construction, it is easy to check that the plausibility function at $\theta = 0$ is
$\mathsf{pl}_t(0; S) = 1 - F_{0,h_0}(t)$, which can be evaluated numerically. A size-0.05 test of $H_0: \theta = 0$
can be performed by rejecting $H_0$ if $\mathsf{pl}_t(0; S) \le 0.05$. A simulation study was performed to
compare the power of this local conditional IM test with that of the parametric bootstrap
likelihood ratio test. In our experiments, we found that the two tests had indistinguishable
power functions. We believe this is a good sign. Recall that the local conditional IM is
sacrificing something by focusing on validity only locally. But here we find that by
choosing that particular point as the point of interest, in this case $\theta_0 = 0$, we can
maintain the expected strong performance of the conditional IM. Indeed, the parametric
bootstrap likelihood ratio test is exact and arguably a gold-standard for efficiency. So the
fact that the local conditional IM can match this gold-standard suggests that nothing is
lost by focusing on a suitably chosen point $\theta_0$.
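
To make the test concrete, here is a minimal sketch for the special case $n = 2$ only. The conditional density is obtained by our own linear change of variables, which the text omits: at $\theta_0 = 0$ we have $c_i(0) = 1/w_i$, so with $a_i = \log u_i$, $v = a_1 + a_2$ and $h_0 = a_2/w_2 - a_1/w_1$, one can solve $a_1 = w_1(v - w_2 h_0)/(w_1 + w_2)$ and $a_2 = v - a_1$, and the density of $V_T$ given $h_0$ is proportional to $\exp\{v/2 - (e^{a_1} + e^{a_2})/2\}$. The integration range, grid size, data values, and the function name `pl_zero` are assumptions made for illustration.

```python
import math

def pl_zero(x, w, m=8000):
    """Plausibility of theta = 0 in the n = 2 variance-components model.

    Evaluates pl_t(0; S) = 1 - F_{0,h0}(t) with theta0 = 0, using a
    trapezoid rule for the conditional distribution of V_T given h0.
    """
    (x1, x2), (w1, w2) = x, w
    a1_obs, a2_obs = math.log(x1 * x1), math.log(x2 * x2)
    t = a1_obs + a2_obs               # T(x) = sum_i log x_i^2; note z(0) = 0
    h0 = a2_obs / w2 - a1_obs / w1    # H_0(x), using c_i(0) = 1 / w_i

    def log_f(v):
        # recover (log u1, log u2) from (v, h0); log u_i has log density
        # a/2 - exp(a)/2 under ChiSq(1); exponents are capped to avoid overflow
        a1 = w1 * (v - w2 * h0) / (w1 + w2)
        a2 = v - a1
        return 0.5 * v - 0.5 * (math.exp(min(a1, 700.0)) + math.exp(min(a2, 700.0)))

    lo, hi = min(-60.0, t - 5.0), max(20.0, t + 5.0)  # assumed integration range
    step = (hi - lo) / m
    logs = [log_f(lo + i * step) for i in range(m + 1)]
    peak = max(logs)
    dens = [math.exp(val - peak) for val in logs]
    cells = [0.5 * (dens[i] + dens[i + 1]) for i in range(m)]

    # mass to the left of t, with linear interpolation inside the last cell
    pos = (t - lo) / step
    k = min(int(pos), m - 1)
    frac = pos - k
    d_t = dens[k] + frac * (dens[k + 1] - dens[k])
    left = sum(cells[:k]) + 0.5 * (dens[k] + d_t) * frac
    return 1.0 - left / sum(cells)

w = (2.0, 5.0)                 # hypothetical group sizes
print(pl_zero((3.0, 3.0), w))  # large observations: little support for theta = 0
print(pl_zero((0.3, 0.3), w))  # small observations: theta = 0 remains plausible
```

Rejecting when the returned value is at most 0.05 gives the test described above for this two-group case; the general $n$ case requires the $(n-1)$-dimensional conditioning in (6.7), which this sketch does not attempt.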



7 Discussion 



In this paper we have extended the basic IM framework laid out in Martin and Liu
(2012a) by developing an auxiliary variable dimension reduction strategy. This reduction
simultaneously accomplishes two goals. First, it provides a suitable combination of information
across samples, and we argue in Remarks 4 and 5 in Section 3.3 that Fisher's
concept of sufficiency and Bayes' theorem can both be viewed as special cases of this
combination of information via conditioning. Second, this reduction makes construction
of efficient predictive random sets considerably simpler. An apparently new differential
equation technique is proposed by which an auxiliary variable dimension reduction can
be found even in cases where traditional sufficiency considerations fail to give a satisfactory
solution. In addition, as our simulation results in Sections 5.1 and 6.4 demonstrate,
even with a default choice of predictive random set, the conditional IM results are as
good as or better than those using standard likelihood-based methods. This suggests that
our proposed method of combining information is, in some sense, efficient. We expect
that the conditional IM approach, paired with the optimal predictive random sets, will
have even better performance. However, more work is needed since computation of these
optimal predictive random sets remains a non-trivial task.

The local conditional IMs considered in Section 6 are an important contribution. Indeed,
these tools provide a means to reduce the effective dimension even in cases where the
minimal sufficient statistic has dimension greater than that of the parameter. For example,
in the variance-components problem in Section 6.5, we identified a one-dimensional
auxiliary variable to predict, even though there is no dimension reduction that can be
achieved via sufficiency. While the idea of focusing on validity locally at a single $\theta = \theta_0$ itself
seems to provide an improvement, it is, in fact, a special case of a more general idea.
One could measure locality by a general assertion $A$, not necessarily a singleton $A = \{\theta_0\}$.
In this way, one can develop a conditional IM that focuses on validity at a particular assertion
$A$, thus extending the range of application of local conditional IMs. Though a
clear picture of this general idea is not yet available, it is certainly within reach.

The examples in this paper have focused on continuous distributions. Efficient inference
in discrete problems is challenging in any framework, and IMs are no different. For
nice discrete problems, e.g., regular exponential families, the IM analysis described herein
can be carried out without a hitch. However, when sufficiency considerations alone provide
inadequate auxiliary variable dimension reduction, new tools are needed. In particular,
the differential equation-based technique used above may not be applicable because the
baseline association is based on inequalities rather than equalities. But perhaps by using
discrete auxiliary variables, as opposed to continuous ones, it may be possible to refine
this differential equation-driven technique for application in discrete data problems.
Further investigation along these lines is needed.

Finally, note that the problem considered here is when there is some sort of replication
or information about a single quantity coming from multiple sources, e.g., several independent
(noisy) measurements on the same quantity. In such cases, the goal is to combine
the information coming from these different sources, and conditioning is shown to be the
right tool for this sort of dimension reduction. In other problems, dimension reduction
is needed because the real quantity of interest is some lower-dimensional characteristic
of the full unknown parameter. For these nuisance parameter problems, a different sort
of dimension reduction is needed, and marginalization is the appropriate tool. The companion
paper (Martin and Liu 2012b) deals with this problem from an IM point of view,
i.e., with a focus on efficient prediction of unobservable auxiliary variables.

A Proofs, etc

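Proof of Theorem 1. For given $u \in \mathbb{U}$, let $(v_T, v_H) = (\psi_T(u), \psi_H(u))$. Let $\Theta_x(u) = \{\theta :
x = a(u, \theta)\}$ be as defined in Section 2.1, and define

\[ \Theta_{T(x)}(v_T) = \{\theta : T(x) = a_T(v_T, \theta)\}. \tag{A.1} \]

Pick any $A \subseteq \Theta$. It is clear that

\[ \Theta_x(u) = \varnothing \iff \Theta_{T(x)}(v_T) = \varnothing \ \text{or}\ v_H \neq H(x), \]

\[ \Theta_x(u) \subseteq A \iff \Theta_{T(x)}(v_T) \subseteq A. \]

By definition of conditional probability, and the assumed existence of the conditional
distribution $P_{V_T|H(x)}$ for each $x$, the belief function $\mathsf{bel}_x(A)$ for the baseline association
with naive predictive random set $S = \{U\}$, $U \sim P_U$, can be re-expressed as

\[ \mathsf{bel}_x(A) = P_U\{\Theta_x(U) \subseteq A \mid \Theta_x(U) \neq \varnothing\} \]

\[ = P_{(V_T, V_H)}\{\Theta_{T(x)}(V_T) \subseteq A \mid \Theta_{T(x)}(V_T) \neq \varnothing,\ V_H = H(x)\} \]

\[ = P_{V_T|H(x)}\{\Theta_{T(x)}(V_T) \subseteq A \mid \Theta_{T(x)}(V_T) \neq \varnothing\}, \]

the latter quantity being the belief function for the naive IM from $(T(x), a_T, P_{V_T|H(x)})$
with predictive random set $S = \{V_T\}$, $V_T \sim P_{V_T|H(x)}$. Since this holds for all $x$ and all
$A \subseteq \Theta$, the claimed equivalence follows. □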

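Lemma 1. Fix $h \in H(\mathbb{X})$ and take $S$ with natural measure $P_{S|h}$ as defined earlier. Write
$Q_{S|h}(v_T) = P_{S|h}\{S \not\ni v_T\}$. Then, for all $h$, $Q_{S|h}(V_T)$ is stochastically no larger than
$\mathsf{Unif}(0, 1)$ for $V_T \sim P_{V_T|h}$.

Proof. The goal is to show that $P_{V_T|h}\{Q_{S|h}(V_T) > 1 - \alpha\} \le \alpha$, for all $\alpha \in (0, 1)$. Take
any such $\alpha$ and set $S_\alpha = \bigcap\{S \in \mathbb{S}_h : P_{V_T|h}(S) \ge 1 - \alpha\}$. Since $\mathbb{S}_h$ is nested, it follows that
$S_\alpha \in \mathbb{S}_h$ and $P_{V_T|h}(S_\alpha) \ge 1 - \alpha$. By the definition of the natural measure, $P_{S|h}\{S \subseteq S_\alpha\} \ge 1 - \alpha$. Since $Q_{S|h}(v_T) > 1 - \alpha$
iff $v_T \notin S_\alpha$, it follows that

\[ P_{V_T|h}\{Q_{S|h}(V_T) > 1 - \alpha\} = P_{V_T|h}(S_\alpha^c) = 1 - P_{V_T|h}(S_\alpha) \le \alpha. \]

The claim follows since $h$ and $\alpha$ were arbitrary. □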

Proof of Theorem 2. Take any $\theta \notin A$ as the true value of the parameter; then $T(X) =
a_T(V_T, \theta)$, with $V_T \sim P_{V_T|h}$, characterizes the conditional distribution of $X$, given $H(X) =
h$. Since $A \subseteq \{\theta\}^c$, monotonicity of the belief function gives

\[ \mathsf{bel}_{T(X)|h}(A; S) \le \mathsf{bel}_{T(X)|h}(\{\theta\}^c; S) = P_{S|h}\{\Theta_{T(X)}(S) \not\ni \theta\} = Q_{S|h}(V_T). \]

Conditional admissibility of $S$ implies that the right-hand side is stochastically no larger
than $\mathsf{Unif}(0, 1)$. This, in turn, implies the same of the left-hand side $\mathsf{bel}_{T(X)|h}(A; S)$, as a
function of $X \sim P_{X|\theta}$, given $H(X) = h$. Therefore,

\[ P_{X|\theta}\{\mathsf{bel}_{T(X)|h}(A; S) \ge 1 - \alpha \mid H(X) = h\} \le P\{\mathsf{Unif}(0, 1) \ge 1 - \alpha\} = \alpha. \]

Taking supremum over $\theta \notin A$ proves (3.6). □

Proof of Corollary 1. Since the distribution of $S$ is free of $h$ in this case, the belief function
$\mathsf{bel}_{T(x)|h} = \mathsf{bel}_{T(x)}$ is also free of $h$. Therefore, before taking supremum in the last line of
the proof of Theorem 2, we can take expectation over $h$ to remove the conditioning, so
that the validity property holds unconditionally, like in (2.6). □